Abstract

INTERTAIL, the computer program which implements an approach
to tailored testing outlined by Cliff (1975), was examined with
errorless data in several Monte Carlo studies. Three replications of
each cell of a 3 x 3 table with 10, 20 and 40 items and persons were
analyzed. Mean rank correlation coefficients between the true
order, specified by pre-assigned random numbers, and the computed
order produced by the program ranged from .93 to .99. Other
efficiency measures are reported which also support the theory as a
general measuring and ordering technique. Based on these results,
program modifications are proposed as well as a data scheme which
is to be used in further system testing.

This report describes a computer program designed to implement
the computer-interactive testing procedure proposed by Cliff (1975).
The theory starts from the observation that the ordinary item score
matrix, in which a correct response is recorded as a 1 and an
incorrect one as a 0, can be regarded as an adjacency matrix indicating
the relations between a set of items and a set of persons. From
that point of view the matrix should be extended so that it is
items-plus-persons-by-items-plus-persons. Then there are four
sections. These consist of one which is the ordinary item-by-person
rights matrix, a corresponding person-by-item section which is the
wrongs matrix, an item-by-item section which is all zero, and a
person-by-person section which is similarly all zero. Now the
interpretation is that a 1 indicates that the row element dominates the
column element, regardless of which is item and which is person.
The person-person and item-item sections are all zero because these
relations are not observed directly.

Cliff goes on to show that, if the data correspond to the
requirements for a Guttman scale, the supermatrix is equivalent to a type
of incomplete adjacency matrix which records the relations among the
members of a semiorder. Moreover, a kind of Boolean matrix algebra
can be used to complete the matrix of relations. These
"missing" relations are those in the item-item and person-person
sections of the matrix. That is, the person-person and item-item
order relations can be determined as implications of the person-item
responses if the items form a Guttman scale.

This process is relevant to tailored testing because it applies
to incomplete score matrices as well as complete ones. In fact,
all that is necessary to deduce the complete score matrix and the
complete joint order of persons and items is the response of each
person to the easiest item he would fail and the hardest one he
would pass. If these are known, repeated application of the Boolean
matrix algebra will succeed in completing the matrix of responses.

The matrix process works as follows. The matrix $S$ is the
item-person matrix, with $s_{ij} = 1$ if person $i$ passes item $j$ and
zero otherwise, i.e., a rights matrix. The matrix $S'$ is the wrongs
matrix, with $s'_{ij} = 1$ if $i$ fails $j$ and zero otherwise. For
complete data, $S$ and $S'$ are complementary, but for tailored tests
some elements can be zero in both. Now compute $N = S'S$; $n_{jk}$ will
equal the number of persons who failed item $j$ and passed $k$.
Similarly, we compute $X = SS'$; $x_{ih}$ will equal the number of items
that person $i$ passed but $h$ failed. If the items are a Guttman
scale, then either $n_{jk}$ or $n_{kj}$ will be zero, and similarly for
$x_{ih}$ and $x_{hi}$. That is a consequence of the Guttman form of the
score matrix; in a Guttman scale where there is an item that $i$ passes
and $h$ fails, it is never the case that there is a different item
which $h$ passes but $i$ fails. In a Guttman scale, then, all that need
be recorded is that $i$ dominates or defeats $h$ (1) or not (0). It is
in this simple sense that we speak of the matrix algebra being Boolean;
only the 1 or 0 is recorded, not the actual numbers.

This process is illustrated in Figures 1a and 1b. From some
points of view, it is simpler to treat $S$ and $S'$ as elements of a
supermatrix $A$. Then $N$ and $X$ are sections of the supermatrix $A^2$,
as shown in the figures.

Figure 1a. Full adjacency matrix for two-set dominance data.

In the tailored case, the logic works as follows. Suppose there
is an item which person $i$ passes but $h$ fails. Then $i$ dominates (is
smarter or more knowledgeable than) $h$. Suppose there is another item
which $i$ himself fails. Then we need not present it to $h$ because under
the Guttman assumption the latter must fail it too. Similarly, if there
is an item which $h$ passes, we need not present it to $i$ because he must
pass it. In fact, such chains of inference can be extended over an
interlocked series of persons and items to lead finally to a conclusion
that $i$ will answer $j$ correctly (or incorrectly). This implicational
process can be symbolized quite simply with the Boolean matrix algebra.
All that is necessary is to have the $S$ and $S'$ matrices incomplete, in
the sense that $s_{ij} = s'_{ij} = 0$ for some of the elements, and to
have the $A$ supermatrix raised to powers higher than 2. Suppose we have
the results for all of the persons on some of the items. Then $A^2$ will
contain person-person and item-item dominances that are implied by
those responses. If $A^3$ is then computed, what it contains is the
person-item responses that are implied by the relations in $A^2$. If
$A^4$ is computed, it contains the dominances that are implied at one
remove; similarly, $A^5$ will contain the responses that are implied,
but implied indirectly. This process can continue to as high a power
as seems useful. It is illustrated in Figure 2.

Figure 2. Incomplete adjacency matrix A and its powers.
(Entries with asterisks represent item-person
pairs which are not observed directly.)
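
The powering logic can be tried on a very small case. The following is a minimal sketch, not the INTERTAIL code: it assumes a plain list-of-lists supermatrix and a hypothetical ordering of two persons and two items, and the function names `bool_mult` and `bool_power_sum` are illustrative only.

```python
def bool_mult(a, b):
    """Boolean matrix product: an entry is 1 if any intermediate element links row to column."""
    n = len(a)
    return [[int(any(a[r][m] and b[m][c] for m in range(n))) for c in range(n)]
            for r in range(n)]


def bool_power_sum(a, max_power):
    """Boolean sum A + A^2 + ... + A^max_power."""
    total = [row[:] for row in a]
    power = [row[:] for row in a]
    for _ in range(max_power - 1):
        power = bool_mult(power, a)
        total = [[t | p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return total


# Hypothetical element order: 0 = person i, 1 = person h, 2 = item j, 3 = item k.
# A 1 means the row element dominates the column element.
A = [[0, 0, 1, 0],   # i passes j
     [0, 0, 0, 1],   # h passes k
     [0, 1, 0, 0],   # h fails j, so j dominates h
     [0, 0, 0, 0]]   # nothing observed in which k dominates

A2 = bool_mult(A, A)
A3 = bool_mult(A2, A)
print(A2[0][1])  # person-person dominance of i over h, implied by i > j > h
print(A3[0][3])  # implied response: i passes k, although i was never given k
```

In this toy case the second power supplies the person-person and item-item relations and the third power supplies an unobserved person-item response, which is the pattern the text describes.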



There is a major difficulty with the foregoing. This is that
test items do not form Guttman scales, but are at best quasi-scales.
Consequently, the direct application of the foregoing procedure can
result in erroneous or contradictory implications. Some method of
making the process relatively insensitive to the probabilistic nature
of test response is needed. Therefore, while the computer program
essentially follows the algebraic procedure described above, an
additional feature is added. This is that while all the matrices
are stored in the dichotomous form, the calculations are carried out
numerically and certain quantitative tests are performed before an
entry is recorded in a product matrix as a 1 or a 0. This approach
is taken in order to reduce the effect of non-transitivity in the
data.

For example, in computing $n_{jk}$ from the binary response matrices,
the first operation is

    $n_{jk} = \sum_i s'_{ij} s_{ik}$.

Then the symmetric element $n_{kj}$ is also computed in the same way:

    $n_{kj} = \sum_i s'_{ik} s_{ij}$.

Then the following ratio is computed:

    $z_{jk} = \frac{n_{jk} - n_{kj}}{\sqrt{n_{jk} + n_{kj}}}$.

This may be recognized as the ratio for correlated proportions
(Guilford and Fruchter, 1974). Then there is a specified criterion
value for $z_{jk}$, and $n_{jk}$ is recorded as 1 if the obtained ratio
exceeds that criterion, $n_{kj}$ is recorded as 1 if $-z_{jk}$ exceeds
it, and both are zero if neither is the case. Thus $j$ dominates $k$
only if answered wrongly by "significantly" more persons. Some other
function of $n_{jk}$ and $n_{kj}$ could obviously be used for this
purpose, but this is the approach used here. The same procedure is
used for person dominance.
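
The thresholding rule can be illustrated as follows. This is a sketch under stated assumptions rather than the program's own routine: the ratio is taken to be the usual correlated-proportions form, (n_jk - n_kj) / sqrt(n_jk + n_kj), the matrices are ordinary persons-by-items lists, and the names `dominance` and `item_dominance` are hypothetical.

```python
import math


def dominance(n_jk, n_kj, criterion=1.0):
    """Return the Boolean pair (j over k, k over j) from the two raw counts."""
    total = n_jk + n_kj
    if total == 0:
        return (0, 0)            # no evidence in either direction
    z = (n_jk - n_kj) / math.sqrt(total)
    if z > criterion:
        return (1, 0)
    if -z > criterion:
        return (0, 1)
    return (0, 0)


def item_dominance(rights, wrongs, criterion=1.0):
    """rights[i][j] and wrongs[i][j] are 0/1 entries for person i on item j."""
    n_items = len(rights[0])
    dom = [[0] * n_items for _ in range(n_items)]
    for j in range(n_items):
        for k in range(j + 1, n_items):
            # persons who failed j and passed k, and the reverse
            n_jk = sum(w[j] & r[k] for r, w in zip(rights, wrongs))
            n_kj = sum(w[k] & r[j] for r, w in zip(rights, wrongs))
            dom[j][k], dom[k][j] = dominance(n_jk, n_kj, criterion)
    return dom


# Four persons, two items; item 0 is failed by three persons who pass item 1.
rights = [[0, 1], [0, 1], [0, 1], [1, 1]]
wrongs = [[1, 0], [1, 0], [1, 0], [0, 0]]
print(item_dominance(rights, wrongs))
```

With a criterion of 1.0, three failures against none gives a ratio of about 1.73, so item 0 is recorded as dominating item 1; a single contradictory response would not be enough to reverse or erase that entry.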

A second major problem is that of matching the individual with 
an appropriate item when there is only partial information about both. 
Our approach to it follows from the conceptualization of the relevant 
score matrix as items-plus-persons by items-plus-persons. The tradi- 
tional total score for a person is his sum across a row of the score 
matrix; the traditional item difficulty (actually, it is an easiness) 
is the sum down a column of it, divided by n. The present formulation 
extends these in several ways. First, we consider both the rights 
score and the wrongs score since the matrix is incomplete and there 
is information in both. Second, implied as well as directly observed 
relations are included in the scores, including the item-item and 
person-person relations. Finally, both persons and items are treated 
in exactly the same way; for both, a count is made of the number of 
items and persons it dominates and is dominated by, both directly 
and by implication. Then items and persons are matched on the basis 
of their net dominance scores. 

This is rather easy to formulate symbolically using the present
notation. Given the matrix $G$ which is the Boolean sum of successive
powers of the data matrix at a given point, the "wins" are the number
of 1s in a given row and the "losses" are the number in the
corresponding column. Then a net dominance score for an item or person
is simply the difference between the two. On the next round, a person
is given the item with the net win score nearest to his, although any
function of the two could be used in the decision.
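
As an illustration of the scoring rule, the sketch below computes net dominance scores from a small Boolean matrix and picks the nearest item for a person. The matrix values, the element ordering, and the function names are assumptions made for the example; tie-breaking and the exclusion of items already administered are not addressed here.

```python
def net_dominance(g):
    """Net score per element: 1s in its row (wins) minus 1s in its column (losses)."""
    n = len(g)
    wins = [sum(row) for row in g]
    losses = [sum(g[r][c] for r in range(n)) for c in range(n)]
    return [w - l for w, l in zip(wins, losses)]


def nearest_item(person, items, scores):
    """Item whose net score is closest to the person's net score."""
    return min(items, key=lambda it: abs(scores[it] - scores[person]))


# Hypothetical order: elements 0 and 1 are items, elements 2 and 3 are persons.
G = [[0, 1, 0, 1],
     [0, 0, 0, 0],
     [1, 1, 0, 1],
     [0, 1, 0, 0]]

scores = net_dominance(G)
print(scores)
print(nearest_item(2, [0, 1], scores))
```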

The foregoing describes the overall basis of the program and 
the two major heuristics that it employs for robustness and efficiency. 
The method of data storing and the method of computation of the power- 
ing process may also be worth commenting on. The program combines 
the fact of the binary nature of the data with the fact that current 
machines process data as words, in our case 32-bit ones. All of the 
binary matrices are stored as bits in words. For example, the two 
lines below give the rights and wrongs scores of a person i who has 
responded to 13 of the items on a 45-item test, passing six and 
failing seven. 

$s_{ij}$  : 00000000000000000010001001000100 00001000000100000000000000000000
$s'_{ij}$ : 00000000001000001000000100010000 01000010010000000000000000000000

The 1s in the upper row correspond to the positions of the items
he passed and those in the lower to those he failed. Thus his
complete set of responses requires only four storage locations, and
space requirements grow as x/32 instead of x.
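
The storage scheme can be mimicked in a few lines. This is only a sketch: it assumes the leftmost item occupies the high-order bit of each 32-bit word, which matches how the example rows are printed but is not confirmed as the program's internal layout, and `pack` and `unpack` are hypothetical names.

```python
WORD_BITS = 32


def pack(bits):
    """Pack a list of 0/1 values into 32-bit words, padding the last word with zeros."""
    words = []
    for start in range(0, len(bits), WORD_BITS):
        chunk = bits[start:start + WORD_BITS]
        chunk = chunk + [0] * (WORD_BITS - len(chunk))
        word = 0
        for b in chunk:
            word = (word << 1) | b   # earlier items end up in higher-order bits
        words.append(word)
    return words


def unpack(words, n_items):
    """Recover the first n_items 0/1 values from the packed words."""
    bits = []
    for word in words:
        for shift in range(WORD_BITS - 1, -1, -1):
            bits.append((word >> shift) & 1)
    return bits[:n_items]


# Rights vector of the example person: 45 items, six passed
# (0-based positions read off the upper row of the example).
passed = [18, 22, 25, 29, 36, 43]
rights_bits = [1 if pos in passed else 0 for pos in range(45)]
rights_words = pack(rights_bits)
for w in rights_words:
    print(format(w, "032b"))
```

Forty-five responses fit in two words per matrix, so the rights and wrongs rows together occupy the four storage locations mentioned in the text.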

This binary storage feature is also an advantage in the powering
process. For example, suppose we wish to compute $x_{ih}$, the number
of items $i$ gets right and $h$ wrong. Then $s_{ij}$ is combined with
$s'_{hj}$ using the "and" function, as illustrated below.

$s_{ij}$  : 00000000000000000010001001000100 00001000000100000000000000000000

"and"

$s'_{hj}$ : 00000000000100001000000000000100 00001000000000000000000000000000

gives

            00000000000000000000000000000100 00001000000000000000000000000000

Thus in this case $x_{ih} = 2$ because there are shown to be two items
passed by $i$ and failed by $h$. In order to find this number, the words
must be unpacked to find the number of non-zero elements, which
means that some of the time gained by carrying out the arithmetic
32 steps at a time is lost, but the gain is substantial, particularly
since many times $x_{ih}$ will be zero. The use of this binary storage
in computation necessitates routines to perform the required packing
and unpacking and utilization of special logical functions. Our
program uses the logical functions of a local system (IBM 370/158).
Such functions are either available for most machines or readily
written by the system programmers.
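
The word-wise "and" and the subsequent count can be reproduced with the two example rows. The sketch below is illustrative only; it uses Python integers in place of 32-bit machine words, and `count_dominances` is a hypothetical name, not a routine of the program.

```python
def count_dominances(rights_i, wrongs_h):
    """Number of items passed by i and failed by h, from packed words."""
    return sum(bin(r & w).count("1") for r, w in zip(rights_i, wrongs_h))


# The two rows from the example, one integer per 32-bit word.
s_i = [int("00000000000000000010001001000100", 2),
       int("00001000000100000000000000000000", 2)]
s_prime_h = [int("00000000000100001000000000000100", 2),
             int("00001000000000000000000000000000", 2)]

x_ih = count_dominances(s_i, s_prime_h)
print(x_ih)  # 2, as in the worked example
```

When the "and" of two words is zero, no counting is needed for that word at all, which is where much of the saving described above comes from.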

An additional principle used is that of an expanding item pool. 
The idea is that the program initially works with only a subset of 
the available items. Periodically, the consistency of the items 
used so far is examined and those that appear to be less consistent 
than some input value are replaced by others from the pool. 

The program described below is built around these principles 
and techniques. It is programmed as a rather simple main program
with a number of subroutines. In addition to handling the computational 
and decisional aspects, these allow for different modes of operation 
(Monte Carlo, simulation on stored data, and true interactive) and 
provide for various choices concerning operating parameters, output 
formats, and the like. 
OVERVIEW OF TAILORED TESTING PROGRAM - INTERTAIL

The current version of INTERTAIL requires that the user specify
10 initial parameters which pertain to the size of the study,
various critical values, and the mode of operation. Then, according
to whether the program is in interactive or Monte Carlo mode,
person-item pairs are established. In the interactive mode, which is
the tryout mode, the pairs are supplied by the user, while in a Monte
Carlo run they are produced by the computer and thus are paired at
random. After this beginning set-up phase, the interactive pairing
and matrix powering sequence begins. The major steps involved are set
up as subroutines. They are described below, and flow charts of
the main program and subroutines are given in the Appendix.

1 - Subroutine INTERACT 

In the interactive mode person-item pairs are presented
and the user specifies the dominance between them. For a Monte
Carlo, determining dominances is done automatically by comparing
the previously assigned random numbers of person and item, and
assigning a "win" to the larger. The record of wins and losses
for each item and person is then logged in one of several
matrices. Thus for the case of an item win over a person, the
item win matrix records a 1 for this item against this person.
Similarly, in the person losses matrix a 1 is recorded for
the loss to this item for this person.

2 - Subroutine SQUARE 

The items for which new relations have been established
are tested against all other items in order to determine the
item-item dominances. Essentially this involves comparing the
wins and losses between two items and testing the ratio of
one item's wins to the other's wins. If the ratio is greater
than some critical value then the first item dominates the
second; if the ratio is less than -1.0 times the critical
ratio then the second item dominates the first.

3 - Subroutine DIPLY 

The process of powering the item-by-item wins and losses
matrices is carried out as many times as the user specifies.
Each item of the group for which new relations have been
collected, IR, is compared against all other items in the item
pool, J. The number of dominances of item IR over J and also
of J over IR is computed. If neither item IR nor item J currently
has enough wins over the other, then any previous order
between the two is removed. If, however, one of the two items
has beaten the other frequently enough as specified by the user,
then the counts of item-item wins and losses are incremented
and entries are added to the appropriate locations of the
item-item dominance matrices.

4 - Subroutine REPLAC

If all the available items are in use, this section is
skipped. However, if that is not the case, each item is examined
for its current consistency. All persons are checked against
an item and the number of inconsistencies is counted (an
inconsistency is defined as $I_x > P_y$ and $P_y > I_x$). If the ratio
of the number of inconsistencies for an item divided by the number of
persons is less than some user-specified value, the item is replaced
by another from the item pool.

5 - Subroutine MULT 

Summary implications are then computed for all relations between
the items and persons. Each person is tested against each item by
comparing the number of wins of the person over the item, N1, with
the number of wins of the item over the person, N2. The ratio
$(N1 - N2)/\sqrt{N1 + N2}$ is tested against a user-specified critical
value. If the ratio is greater, a dominance is recorded for the person
over the item; if the ratio is less than the negative of the
user-supplied value, a dominance is recorded for the item over the
person.

6 - Subroutine COUNT 

Each person is compared to all other persons to determine the
current order among them. This is accomplished by accumulating person
X wins against person Y losses and person Y wins against person X
losses. These numbers are tested in the ratio $(X - Y)/\sqrt{X + Y}$.
If the ratio is greater than a user-supplied value, a win is added to
X's total and a loss to Y's total; if the ratio is less than the
negative of the user-supplied value, a win is added to Y's total and a
loss to X's total.

7 - Subroutine OUTPUT

The current status of the person and item orders and
the contents of the super-matrix can be printed if opted. If
the person-item orders are requested then they are ranked and
printed according to person wins plus item wins minus the sum
of person losses plus item losses (T = PW + IW - (PL + IL)).
In the same way items are ranked by T = PW + IW - (PL + IL).
In addition, if they are requested, the person-item and
item-item matrices are printed.

8 - Subroutine SELECT 

If there are more relations to be gathered, the persons 
and items are matched as optimally as possible. This process 
first finds the person who has the fewest relations on the 
items, then locates the person with the next fewest, etc. An 
item is paired with a person by locating the item with the 
number of net person wins which is closest to the person's num- 
ber of net item wins. 
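
The matching step might be sketched as below. Several things here are assumptions not stated in the text: net scores are passed in as single numbers per person and per item, items already related to a person are excluded from that person's candidates, ties go to the lowest item index, and the name `select_pairs` is hypothetical.

```python
def select_pairs(known, person_net, item_net):
    """Pair each person with the open item whose net score is nearest.

    known[p] is the set of item indices for which person p already has a relation.
    Persons with the fewest known relations are handled first.
    """
    order = sorted(range(len(known)), key=lambda p: len(known[p]))
    pairs = []
    for p in order:
        open_items = [i for i in range(len(item_net)) if i not in known[p]]
        if not open_items:
            continue                 # nothing left to ask this person
        best = min(open_items, key=lambda i: abs(item_net[i] - person_net[p]))
        pairs.append((p, best))
    return pairs


known = [{0, 1}, {0}, set()]
person_net = [2, 0, -2]
item_net = [3, 0, -3]
print(select_pairs(known, person_net, item_net))
```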

The above sequence, represented by steps 1 through 8, 
is repeated until the person-item super-matrix is filled 
(i.e., when a relation exists for each person on each item). 
At that point the final information can be printed by means 
of subroutine OUTPUT. Then, unless the user specifies that 
another run is to be started, the program terminates. 

Monte Carlo Study

In order to examine the efficacy of INTERTAIL a series of
Monte Carlo runs was designed and carried out. The scheme which
was implemented assigned random numbers to the items and persons
at the beginning of the session. These numbers served as measures
of item difficulty or a person's ability, and by using them dominance
could be assessed directly (i.e., if the random number assigned to
person $P_1$ was greater than that which was assigned to item $I_a$,
then $P_1$ was said to have answered $I_a$ correctly, or $P_1 > I_a$).
This is essentially an errorless data technique because no allowance was
made for chance factors or the interaction of ability and
discrimination levels. (Subsequent studies are currently being planned
which will test the system with random factors included in the data.)
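
The errorless data scheme is simple enough to sketch directly. The code below is an assumption-laden illustration, not the program's generator: it draws uniform random numbers with Python's `random` module, and the function names and the seed are hypothetical.

```python
import random


def make_true_values(n_persons, n_items, seed=0):
    """Assign one random number to each person and each item."""
    rng = random.Random(seed)
    persons = [rng.random() for _ in range(n_persons)]
    items = [rng.random() for _ in range(n_items)]
    return persons, items


def respond(person_value, item_value):
    """Errorless response: the person passes when his number exceeds the item's."""
    return 1 if person_value > item_value else 0


def response_matrix(persons, items):
    """Full persons-by-items rights matrix implied by the assigned numbers."""
    return [[respond(p, t) for t in items] for p in persons]


persons, items = make_true_values(10, 10, seed=1)
rights = response_matrix(persons, items)
for row in rights:
    print("".join(str(v) for v in row))
```

Because every response is determined by a comparison of two fixed numbers, the resulting matrix always satisfies the Guttman requirement, which is why the runs are described as errorless.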

Parameter Definitions 

As currently written, INTERTAIL requires the user to specify
9 parameters which are used in various sections of the program. In
the completely interactive approach these parameters are obtained
by means of the computer prompting the user for a certain option
(e.g., "INPUT NUMBER OF PERSONS") and recording the subsequent response.
The actual values used in the present trials are given in brackets
after the individual parameter descriptions, and the acronyms used
in the program appear in capital letters. The parameters are (1) NPER,
the number of persons in the study [10, 20, 40]; (2) NTOT, the total
number of items [10, 20, 40]; (3) NITEM, the number of items, less
than or equal to NTOT, which is the subset of items actually in use
at one time [10, 20, 40]; (4) NTIME, the number of cycles to be
completed before an item consistency check is made in subroutine
REPLAC QJ; (5) CONST, the absolute z value which must be surpassed
when judging an item to be inconsistent [.99]; (6) OPTD, the optimum
difference in net wins between any item and person being paired for
the next cycle [0.0]; (7) RATIO, the absolute z value which must
be surpassed to define an item-person or item-item dominance [1.0];
(8) NCYC, the number of cycles which must elapse before the powering
process begins ^sj; (9) INITCY, the number of items presented to a
person before entering the major iterative process [1]. These
parameters were used to test the program with nine different sized
studies. Each of the nine combinations of 10, 20, or 40 persons and
items was replicated three times so that solution variability could be
examined.

Results 

The solution INTERTAIL produces is primarily a rank ordering
of the items and persons, although other indices are also given.
Therefore the principal concern is the correlation of the computed
rank order, given at the end of a Monte Carlo run, and the true rank
order of the persons and items based on the initially assigned random
numbers. The mean rank order correlation coefficients (Kendall's
tau) for the nine study sizes are shown in Table 1. They are close to
unity, on the average. As can be seen, there is a general tendency
for studies involving larger numbers of relations to produce stronger
correlations, presumably because the greater the number of possible
pairs, the more likely the computation of a dominance relation between
any two entities becomes (that is, between any item-person, item-item,
or person-person pair).

TABLE 1

Mean correlation coefficients between true
order and computed order.

                        Number of Items
                        10      20      40
Number of       10     .94     .97     .93
Persons         20     .95     .98     .98
                40     .95     .98     .99


It is noteworthy that for this errorless data there are no
perfect correlations in any of the studies. This situation arises when
the random assignment of numbers produces a true order such that two
or more items (or two or more persons) are adjacent to each other
in the matrix. In such cases, although the tied items or persons
are not necessarily out of order, the routine calculation of tau in
effect penalizes any inability to duplicate the true order exactly.
In light of this consideration, these results were interpreted as
signifying that the program did essentially recapture the true order
of the original matrices; that the coefficients were less than 1.0
indicates that an order among adjacent persons is determined by chance
when no item can be used to fine-tune the dominance relationships.
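
The effect of ties on tau can be seen in a small example. The sketch below computes the simplest form of Kendall's tau (often called tau-a); which variant the program actually used is not stated in the text, so this choice, like the function name `kendall_tau`, is an assumption.

```python
def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) pairs divided by all pairs."""
    n = len(x)
    concordant = discordant = 0
    for a in range(n):
        for b in range(a + 1, n):
            s = (x[a] - x[b]) * (y[a] - y[b])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
            # s == 0 is a tie: the pair adds nothing to the numerator
    return (concordant - discordant) / (n * (n - 1) / 2)


true_order = [1, 2, 3, 4]
tied_solution = [1, 2, 2, 4]      # two adjacent elements left unresolved
swapped_solution = [1, 3, 2, 4]   # the same two elements placed in the wrong order

print(kendall_tau(true_order, true_order))
print(kendall_tau(true_order, tied_solution))
print(kendall_tau(true_order, swapped_solution))
```

Under this variant a single unresolved adjacent pair is enough to pull the coefficient below 1.0 even though nothing is out of order, which is consistent with the interpretation given above.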

A second major interest in this study concerned how many
responses needed to be produced relative to the total number of possible
relations. The ultimate value of a tailored approach lies in how
much information a single response will generate, or how many possible
relations are eliminated by the powering process. Table 2 shows the
mean percentage of possible relations which were accounted for by
responses. The values ranged from a high of 72% to a low of 34%.
There is again a general trend for larger studies to yield smaller
percentages, indicating the increased efficiency of the technique
with bigger problems.

TABLE 2

Mean percentage of possible relations
accounted for by responses.

                        Number of Items
                        10      20      40
Number of       10     .72     .49     .46
Persons         20     .58     .44     .34
                40     .48     .39     .46


Figure 3 demonstrates the rates at which the various sized
problems were solved (i.e., solved indicating that a relationship exists
between each item and person). The nine studies are separated
according to number of items. Each plot reflects the program's
orientation toward the person solution over the item solution. In
other words, for cases where the number of persons and items are not
equal, a cycle is defined as pairing each person with an item and
not the converse. This approach requires that a single item will
be used more than once when the number of persons is greater than
the number of items, and that some items will not be used during a
cycle when the number of persons is less than the number of items.
Across all studies with the same number of items there is a
consistent effect for fewer responses per person as the number of total
relations increases. Similarly, the relative solution rates between
10, 20 and 40 persons can be seen to be approximately the same.

Finally, performance was also assessed in terms of central
processing unit (CPU) time. CPU time is the actual amount of time
a computer system is involved with calculation or institution of
input or output. Table 3 shows the average amount of CPU time in
seconds for the nine studies. In each case, as the maximum number
of relations to be determined increases, so does the amount of CPU time.

TABLE 3

Mean CPU Time (seconds)

                        Number of Items
                        10      20      40
Number of       10     3       10.8    64
Persons         20     5.2     18.7    89.4
                40     11.9    36.6    132.1

Discussion

This preliminary examination of INTERTAIL involved errorless
data and stimulus sets ranging in size from 100 to 1600 relations.
The investigation gave a generally positive picture of the approach,
for all indices followed a rather orderly progression in solution speed
and accuracy over the various study sizes. It has been suggested
by Knuth (1973) that the minimum number of responses required to
order N stimuli is $\log_2 N!$, where in this context N is the number of
persons and items. The program produced consistently close
approximations to the theoretical minimum over all study sizes.
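
For reference, the bound itself is easy to evaluate. The sketch below is illustrative only: it assumes N is the sum of the number of persons and items in each cell of the design, uses the log-gamma function to avoid forming large factorials, and `min_responses` is a hypothetical name.

```python
import math


def min_responses(n):
    """Lower bound log2(N!) on the comparisons needed to order N stimuli."""
    return math.lgamma(n + 1) / math.log(2)


# N = persons + items for each of the nine study sizes.
for n_persons in (10, 20, 40):
    for n_items in (10, 20, 40):
        n = n_persons + n_items
        print(n_persons, n_items, round(min_responses(n), 1))
```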

Two major issues arose as a result of the work to this point.
The first involves the problem of a stopping rule. The current
version of the program halts only after a relation has been recorded
for each stimulus pair. Toward the end of any problem the net
information to be gained from any one response tends to decrease; that
is, there is usually little impact on the final order from the last few
relations. Consequently a rule could be developed which would
terminate the program even with some relations outstanding. Collecting
and processing the last responses takes time but does not generally
alter the person order, so this effort could be eliminated. The
problem is a common one in iterative approaches, namely, how close is
close enough, or what is to be gained for the final effort of gathering
all relations. The issue is currently being analyzed in detail and
will be specifically examined after the program has been tested with
other data.

The second allied problem encountered was in the area of stimulus
set size, especially in the possibility of employing only a subset
of the total item pool. The idea is that if the persons can be ordered
with fewer items than the total, substantial overall efficiency could
be achieved by not needlessly handling all possible pairs. This
could be approached from a sort of dynamic item pool concept which
would begin with a distilled sample of items and thereafter add or
delete items as their relative impact on the person order was assessed.
With smaller numbers of pairs this saving would be only marginally
important. However, as the size of the stimulus set increases it
appears likely that such a trimmed item pool would yield substantial
savings in computer time and the number of questions asked each subject.

The next phase will be to construct a second series of Monte
Carlo studies to test the program with more realistic data. This
segment will employ a model with measures corresponding to a subject's
ability, and item characteristics of discrimination, difficulty and
a guessing probability. Results will be used to make decisions about
the stopping rule problem and the idea of a reduced item pool.